Input Validation And Sanitization
This document details the input validation and sanitization strategies implemented across the application. It focuses on:
Phone number normalization and validation
Email address extraction and filtering
User-provided content sanitization for messages
Security measures against malicious file uploads, CSV parsing risks, and command injection attempts
Input encoding strategies, escape sequence handling, and data integrity verification
The analysis covers both the Electron main process handlers and the Python backend utilities, and traces how user inputs are processed, validated, sanitized, and transmitted.
The application comprises:
Electron main process handlers for WhatsApp, Gmail, and SMTP operations
Frontend React components for user interaction
Python utilities for phone number cleaning and contact extraction
A Flask backend for file upload and user management
This section outlines the primary validation and sanitization mechanisms implemented in the codebase.
Phone number cleaning and normalization
Removes separators and non-digit characters except plus sign
Enforces length constraints and optional international prefix
Standardizes local numbers to international format when applicable
Manual phone number parsing
Accepts multiple formats: standalone numbers, name:number pairs, and delimiter-separated entries
Uses regex heuristics to detect phone-like substrings
Produces normalized contacts with optional names
Contact extraction from files
Supports CSV, TXT, and Excel formats
Heuristic detection of phone and name columns
Robust fallbacks and error handling for malformed inputs
Email list parsing
Reads CSV with flexible column names or plain text newline-separated entries
Keeps only entries containing “@”, as a rough approximation of a valid email address
Message sanitization
Limits message lengths for safety and performance
Encodes HTML content appropriately for transport
Avoids unsafe inline styles or scripts in HTML messages
File upload restrictions
Whitelists allowed file extensions
Uses secure filename generation
Stores uploads under controlled paths
The validation pipeline spans the frontend, the Electron main process, and the Python utilities.
Phone Number Validation and Normalization
Phone numbers undergo strict cleaning and normalization:
Strips whitespace and common separators
Removes non-digit characters except “+”
Handles leading zeros and optional international prefixes
Validates digit count within accepted bounds
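The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the application's actual code: the function name `clean_phone_number_sketch`, the `default_country_code` parameter, and the 7-digit lower bound are assumptions, while the 15-digit upper bound follows the E.164 maximum.

```python
import re
from typing import Optional

MIN_DIGITS = 7    # assumed lower bound, not taken from the codebase
MAX_DIGITS = 15   # E.164 allows at most 15 digits


def clean_phone_number_sketch(raw: str, default_country_code: str = "") -> Optional[str]:
    """Return a normalized number such as '+14155550123', or None if invalid."""
    if not raw:
        return None
    # Drop spaces, dashes, dots, parentheses, and anything else that is
    # neither a digit nor a plus sign.
    text = re.sub(r"[^\d+]", "", raw.strip())
    has_plus = text.startswith("+")
    # A plus sign is only meaningful as the first character.
    digits = text.replace("+", "")
    if not has_plus and digits.startswith("00"):
        # '00' is a common international dialing prefix; treat it like '+'.
        digits = digits[2:]
        has_plus = True
    elif not has_plus and digits.startswith("0") and default_country_code:
        # Local number with a leading zero: swap the zero for the country code.
        digits = default_country_code + digits[1:]
        has_plus = True
    if not (MIN_DIGITS <= len(digits) <= MAX_DIGITS):
        return None
    return ("+" if has_plus else "") + digits
```

Returning `None` for rejected input lets callers skip bad entries without raising, which suits bulk contact imports where a few malformed rows are expected.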
Manual Phone Number Parsing
The manual parser supports flexible input formats:
Standalone numbers
Name-number pairs separated by colon or dash
Delimiter-separated entries (newline, comma, semicolon, pipe)
Heuristic detection of phone-like substrings
Parsing flow (recovered from a flowchart whose opening steps are missing): when a line splits into parts, the second part is taken as the number; otherwise the line is treated as a single entry. In both cases a phone-like substring is detected and passed to clean_phone_number(). A valid result becomes a {number, name} contact and is added to the contact list; an invalid one is skipped. Processing then moves to the next line, and the collected contacts are returned at the end.
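A rough sketch of this kind of parser is shown below. The regular expressions, the digit bounds, and the names `parse_manual_numbers_sketch` and `_normalize` are all assumptions made for illustration; `_normalize` is only a stand-in for the real clean_phone_number().

```python
import re
from typing import Dict, List, Optional

# Assumed heuristics, not taken from the codebase.
PHONE_LIKE = re.compile(r"\+?\(?\d[\d\s().-]{5,}\d")
NAME_NUMBER = re.compile(r"^\s*([^\d+:]+?)\s*[:\-]\s*(\+?\(?\d.*)$")


def _normalize(candidate: str) -> Optional[str]:
    """Hypothetical stand-in for clean_phone_number()."""
    digits = re.sub(r"\D", "", candidate)
    if not 7 <= len(digits) <= 15:
        return None
    return ("+" if candidate.strip().startswith("+") else "") + digits


def parse_manual_numbers_sketch(text: str) -> List[Dict[str, Optional[str]]]:
    contacts: List[Dict[str, Optional[str]]] = []
    # Entries may be separated by newlines, commas, semicolons, or pipes.
    for entry in re.split(r"[\n,;|]+", text):
        entry = entry.strip()
        if not entry:
            continue
        name: Optional[str] = None
        pair = NAME_NUMBER.match(entry)
        if pair:
            # "Name: number" or "Name - number": part 1 is the name,
            # part 2 is searched for the number.
            name, entry = pair.group(1).strip(), pair.group(2)
        found = PHONE_LIKE.search(entry)
        number = _normalize(found.group(0)) if found else None
        if number is None:
            continue  # no valid number in this entry; move to the next one
        contacts.append({"number": number, "name": name})
    return contacts
```

Requiring the text after the separator to start like a number keeps dashes inside a bare number such as 415-555-0123 from being mistaken for a name separator.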
Contact Extraction from Files
File-based contact extraction supports multiple formats:
CSV: heuristic column detection for phone/name; robust fallbacks
TXT: delimiter-separated lines with optional name
Excel: pandas-based parsing with similar heuristics
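To illustrate heuristic column detection with a fallback, here is a minimal CSV-only sketch. It uses the standard-library `csv` module rather than pandas so that it stays self-contained; the keyword lists, the digit bounds, and the function names are assumptions, not the application's actual values.

```python
import csv
import io
import re
from typing import Dict, List, Optional, Sequence

PHONE_HINTS = ("phone", "mobile", "number", "tel")  # assumed header keywords
NAME_HINTS = ("name",)                              # assumed header keywords


def _find_column(headers: Sequence[str], hints: Sequence[str]) -> Optional[int]:
    """Return the index of the first header containing any hint, else None."""
    for index, header in enumerate(headers):
        if any(hint in header.strip().lower() for hint in hints):
            return index
    return None


def extract_contacts_from_csv_sketch(text: str) -> List[Dict[str, Optional[str]]]:
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if not rows:
        return []
    phone_col = _find_column(rows[0], PHONE_HINTS)
    name_col = _find_column(rows[0], NAME_HINTS)
    if phone_col is None:
        # Fallback: no recognizable phone header, so treat every row as data
        # and read numbers from the first column.
        phone_col, data = 0, rows
    else:
        data = rows[1:]
    contacts: List[Dict[str, Optional[str]]] = []
    for row in data:
        if phone_col >= len(row):
            continue  # short or malformed row
        value = row[phone_col].strip()
        digits = re.sub(r"\D", "", value)
        if not 7 <= len(digits) <= 15:
            continue  # not plausibly a phone number
        name = ""
        if name_col is not None and name_col < len(row):
            name = row[name_col].strip()
        number = ("+" if value.startswith("+") else "") + digits
        contacts.append({"number": number, "name": name or None})
    return contacts
```

Short rows and implausible values are skipped instead of aborting the whole import, which is one way to get the robust handling of malformed input described above.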
Email Address Parsing and Filtering
Email lists are parsed from CSV or plain text:
CSV: flexible column names (email, Email, ADDRESS, etc.) or first column fallback
Text: newline-separated entries; only those containing “@” are kept
Transport encoding: HTML content-type header included
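The parsing rules above might look roughly like this. The accepted column names, the `is_csv` flag, and the function name are assumptions for illustration; the handlers described here may decide the format differently.

```python
import csv
import io
from typing import List

EMAIL_COLUMNS = ("email", "e-mail", "mail", "address")  # assumed accepted names


def parse_email_list_sketch(text: str, is_csv: bool) -> List[str]:
    if is_csv:
        rows = [row for row in csv.reader(io.StringIO(text)) if row]
        if not rows:
            return []
        headers = [cell.strip().lower() for cell in rows[0]]
        # Use a recognized email column if present, else fall back to column 0.
        column = next(
            (i for i, header in enumerate(headers) if header in EMAIL_COLUMNS), 0
        )
        candidates = [row[column] for row in rows if column < len(row)]
    else:
        candidates = text.splitlines()
    # The "@" test is only an approximation; it also drops the header row,
    # because a header such as "Email" contains no "@".
    return [candidate.strip() for candidate in candidates if "@" in candidate]
```

Because the “@” check accepts strings such as `a@b`, any stricter validation would need to happen elsewhere, for example when the mail server rejects a recipient.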
Message Content Sanitization
Message composition includes:
Length limits for performance and platform constraints
HTML content-type header for Gmail transport
Optional HTML stripping for text version in SMTP
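As an illustration of these three points, the sketch below enforces a length limit, derives a plain-text version by stripping HTML, and attaches the HTML body with a `text/html` content type. It is written in Python with the standard `email` package purely for illustration; the limit value and the names `MAX_BODY_CHARS`, `html_to_text`, and `build_message_sketch` are assumptions.

```python
from email.message import EmailMessage
from html.parser import HTMLParser
from typing import List

MAX_BODY_CHARS = 10_000  # assumed limit, not taken from the codebase


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join(" ".join(parser.parts).split())


def build_message_sketch(subject: str, html_body: str) -> EmailMessage:
    if len(html_body) > MAX_BODY_CHARS:
        raise ValueError("message body exceeds the length limit")
    message = EmailMessage()
    message["Subject"] = subject
    message.set_content(html_to_text(html_body))        # text/plain part
    message.add_alternative(html_body, subtype="html")  # text/html part
    return message
```

Sending both parts as `multipart/alternative` lets clients that do not render HTML fall back to the stripped text version.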
File Upload Security Measures
The Flask backend enforces:
Allowed file extensions whitelist
Secure filename generation
Controlled upload path
JSON responses for API endpoints
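The whitelist and filename handling can be sketched with the standard library alone, as below. The extension set, the upload directory, and the function names are assumptions; a Flask backend would more commonly rely on a helper such as `werkzeug.utils.secure_filename`, and the simplified sanitizer here is only a stand-in for that kind of function.

```python
import os
import re
import uuid

ALLOWED_EXTENSIONS = {"csv", "txt", "xlsx"}  # assumed whitelist
UPLOAD_DIR = "uploads"                       # assumed upload directory


def allowed_file(filename: str) -> bool:
    """Accept only names whose final extension is on the whitelist."""
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


def safe_upload_path(filename: str, upload_dir: str = UPLOAD_DIR) -> str:
    if not allowed_file(filename):
        raise ValueError("unsupported file extension")
    # Keep only the final path component, whichever separator the client used,
    # so that "../" sequences cannot escape the upload directory.
    base = filename.replace("\\", "/").rsplit("/", 1)[-1]
    stem, ext = base.rsplit(".", 1)
    # Replace anything outside a conservative character set.
    stem = re.sub(r"[^A-Za-z0-9_-]", "_", stem).strip("_") or "upload"
    # A random prefix avoids collisions between uploads with the same name.
    return os.path.join(upload_dir, f"{uuid.uuid4().hex}_{stem}.{ext.lower()}")
```

Checking the extension before touching the filesystem, and building the stored name on the server, means the client-supplied name never decides where a file is written.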
Key dependencies and interactions:
Frontend components communicate with Electron main process via contextBridge
Pyodide loads Python scripts dynamically for manual number parsing
Handlers depend on environment variables for external services
File parsing relies on pandas for structured formats
Regex-based cleaning and parsing are efficient for typical contact volumes but should be monitored for very large inputs
File parsing uses streaming for CSV; ensure appropriate buffering and memory limits
Message length limits prevent excessive payload sizes and reduce transport overhead
Rate limiting delays in email sending avoid throttling and improve reliability
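The rate-limiting point can be illustrated with a small sketch. The application's sending handlers run in the Electron main process, so this Python version only shows the idea; the function name, the `send` callback, and the one-second default delay are assumptions.

```python
import time
from typing import Callable, Iterable, List


def send_with_delay_sketch(
    recipients: Iterable[str],
    send: Callable[[str], None],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call send() for each recipient, pausing between consecutive sends."""
    sent: List[str] = []
    for index, recipient in enumerate(recipients):
        if index:
            # Pause before every send except the first.
            sleep(delay_seconds)
        send(recipient)
        sent.append(recipient)
    return sent
```

Passing `sleep` as a parameter keeps the pacing logic testable without actually waiting.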
Common validation and sanitization issues:
Invalid phone numbers
Cause: Non-digit characters outside “+”, incorrect length
Resolution: Ensure numeric input with optional “+” prefix and correct digit count
Malformed CSV/Excel files
Cause: Missing headers, unexpected delimiters, mixed encodings
Resolution: Validate schema and encoding; provide clear error messages
Email parsing failures
Cause: Missing “@” or unsupported column names
Resolution: Use supported column names or rely on first-column fallback
File upload errors
Cause: Unsupported extension or missing file part
Resolution: Confirm allowed extensions and proper multipart form submission
The application implements layered input validation and sanitization:
Phone numbers are rigorously normalized and validated
Manual and file-based contact extraction use robust heuristics and error handling
Email lists are filtered and encoded for secure transport
File uploads are restricted and saved securely
Message content is length-limited and encoded appropriately
These measures collectively mitigate injection risks, maintain data integrity, and ensure reliable operation across diverse input formats.